field: force-inline 5x52 mul and sqr by l0rinc · Pull Request #1859 · bitcoin-core/secp256k1

l0rinc · 2026-05-30T13:00:44Z

Problem: The 5x52 field multiplication and squaring routines are hot in group arithmetic and scalar multiplication. Some compilers leave the thin wrappers and int128 inner helpers out of line, which keeps a call boundary in this hot path and limits scheduling of the 64x64->128 arithmetic.

Fix: Define SECP256K1_FORCE_INLINE next to the existing inline helper and use it for the 5x52 multiplication and squaring wrappers and int128 inner helpers.

For default optimized builds, this expands to __forceinline on MSVC-compatible compilers and to __attribute__((always_inline)) on GCC-compatible compilers. It falls back to the existing inline spelling when inlining is disabled, when optimization is disabled, when optimizing for size on GCC/Clang, or when _DEBUG is defined.
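
The fallback rules above could be sketched roughly as follows. This is a minimal illustration, not the macro from the patch: the `DEMO_` names and the `DEMO_NO_FORCE_INLINE` opt-out are hypothetical, the exact conditions used in the PR may be spelled differently, and the helper assumes a compiler that provides `unsigned __int128` (GCC or Clang on a 64-bit target). `__OPTIMIZE__`, `__OPTIMIZE_SIZE__` and `__NO_INLINE__` are GCC-style predefined macros.

```c
#include <stdint.h>

/* Hypothetical stand-in for the library's existing inline spelling. */
#define DEMO_INLINE inline

/* Sketch of a force-inline macro with the fallbacks described above.
 * DEMO_NO_FORCE_INLINE is an assumed opt-out switch, not a real build flag. */
#if defined(DEMO_NO_FORCE_INLINE) || defined(_DEBUG)
#  define DEMO_FORCE_INLINE DEMO_INLINE
#elif defined(_MSC_VER)
#  define DEMO_FORCE_INLINE __forceinline
#elif defined(__GNUC__) && defined(__OPTIMIZE__) && \
      !defined(__OPTIMIZE_SIZE__) && !defined(__NO_INLINE__)
#  define DEMO_FORCE_INLINE DEMO_INLINE __attribute__((always_inline))
#else
#  define DEMO_FORCE_INLINE DEMO_INLINE
#endif

/* Inner helper: full 64x64->128 product, split into two 64-bit halves. */
static DEMO_FORCE_INLINE void demo_mul128(uint64_t a, uint64_t b,
                                          uint64_t *hi, uint64_t *lo) {
    unsigned __int128 p = (unsigned __int128)a * b;
    *hi = (uint64_t)(p >> 64);
    *lo = (uint64_t)p;
}

/* Thin wrapper: when both levels are inlined, the caller sees the
 * multiply directly instead of a call boundary. */
static DEMO_FORCE_INLINE uint64_t demo_mul_hi64(uint64_t a, uint64_t b) {
    uint64_t hi, lo;
    demo_mul128(a, b, &hi, &lo);
    (void)lo; /* the low half is not needed by this wrapper */
    return hi;
}
```

In an unoptimized or size-optimized build the macro degrades to the plain `inline` spelling, so the compiler's own heuristics decide, which matches the fallback behaviour described above.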

Benchmarks: Values are relative changes in Min(us), lower is better.

Source	Host / CPU	Compiler	ecdsa_verify	ecdh	schnorrsig_verify	field_sqr	field_mul
local	M4-Max.local	gcc-14 14.3.0	-9.1%	-9.0%	-9.6%	-7.0%	-4.0%
local	i9-ssd	GCC 16.1.0	-5.3%	-4.1%	-5.5%	-15.7%	-11.6%
local	WIN-A2EHOAU4JET / Xeon E5-2637 v2	MSVC 19.50.35728	-2.6%	-9.3%	-2.4%	-7.4%	-7.4%
local	i7-hdd	GCC 14.2.0	-10.9%	-11.1%	-10.5%	-9.4%	-21.6%
local	umbrel / Intel N150	GCC 12.2.0	-4.9%	-4.3%	-4.6%	+0.6%	-1.1%
local	rpi5-16-3	GCC 14.2.0	-0.6%	-0.7%	-0.6%	-5.5%	-1.0%
local	rpi4-2-1	GCC 14.2.0	-2.7%	-2.3%	-2.7%	-5.6%	-4.0%
local	nodl / Cortex-A53	GCC 11.4.0	-3.3%	-7.6%	-5.7%	-9.9%	-1.8%
andrewtoth	i9-14900HX	GCC 12.3	-5.3%	-4.2%	-5.6%	-1.5%	-6.1%
theStack	Snapdragon X Elite X1E-78-100	GCC 14.2.0	-11.2%	n/a	-11.1%	n/a	n/a
sipa	Ryzen 5950X	GCC 15.2.0	-11.4%	-10.4%	-8.4%	n/a	n/a
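
For reference, the deltas in the table above follow the same formula as the awk step in the Linux benchmarking script, `100*(a-b)/b`. A minimal sketch in C, where the function name is hypothetical:

```c
/* Relative change in percent between two Min(us) readings.
 * Negative values mean the "after" build needs less time per operation. */
static double relative_change_pct(double before_us, double after_us) {
    return 100.0 * (after_us - before_us) / before_us;
}
```

For the M4-Max `ecdsa_verify` row, `relative_change_pct(17.5, 15.9)` is about -9.14, which the table rounds to -9.1%.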

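Tradeoffs: The speedups reproduce most consistently with GCC and MSVC. Clang results were mixed: on i9-ssd with clang 22.1.6, most benchmarks moved by less than 2%, while field_mul improved by 9.5% and ecmult_wnaf regressed by 11.6%.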

Inlining also increases code size:

Platform	Artifact	Before (bytes)	After (bytes)	Delta
macOS GCC	`libsecp256k1.a`	1,254,320	1,311,368	+57,048 (+4.55%)
Linux GCC	`libsecp256k1.a`	1,271,040	1,330,808	+59,768 (+4.70%)
Windows MSVC Release	`libsecp256k1-*.dll`	1,239,040	1,414,144	+175,104 (+14.13%)

Linux benchmarking script

BEFORE=8363a2d8d1b47857c437f7cf22bd11ab06c7c50f; AFTER=33b1b9c455eb2bb07eded939b36abc49859d2ccf; CC=gcc; \
API_ITERS=10000; INT_ITERS=200000; JOBS=1; \
BH=$(git rev-parse --short=12 "$BEFORE") && AH=$(git rev-parse --short=12 "$AFTER") && \
RUN=$(date +%Y%m%d%H%M%S) && \
ROOT="$PWD/.bench-builds/gcc-$BH-$AH-$RUN" && \
RAW="$PWD/.bench-results/secp-bench-gcc-$BH-$AH-$RUN.txt" && \
(set -e; \
  mkdir -p "$ROOT" "$(dirname "$RAW")"; \
  printf "host: %s, compiler: %s\n" "$(hostname)" "$("$CC" --version | sed -n '1p')" | tee "$RAW" >&2; \
  old=$(git symbolic-ref --short -q HEAD || git rev-parse HEAD); \
  trap 'git switch -q "$old" 2>/dev/null || git switch -q --detach "$old"' EXIT; \
  for side in before after; do \
    ref=$([ "$side" = before ] && printf %s "$BEFORE" || printf %s "$AFTER"); \
    git cat-file -e "$ref^{commit}" 2>/dev/null || git fetch -q origin "$ref"; \
    h=$(git rev-parse --short=12 "$ref"); \
    b="$ROOT/$side-$h"; \
    echo "== $side $h ==" >&2; \
    git switch -q --detach "$ref"; \
    cmake -S . -B "$b" -DCMAKE_C_COMPILER="$CC" -DCMAKE_BUILD_TYPE=Release -DBUILD_SHARED_LIBS=OFF -DSECP256K1_BUILD_BENCHMARK=ON -DSECP256K1_BUILD_TESTS=OFF -DSECP256K1_BUILD_EXHAUSTIVE_TESTS=OFF -DSECP256K1_BUILD_CTIME_TESTS=OFF -DSECP256K1_BUILD_EXAMPLES=OFF -DSECP256K1_ENABLE_MODULE_MUSIG=OFF -DSECP256K1_VALGRIND=OFF >> "$RAW" 2>&1; \
    cmake --build "$b" -j "$JOBS" --target bench bench_internal >> "$RAW" 2>&1; \
    echo "=== $side $ref $h ===" >> "$RAW"; \
    SECP256K1_BENCH_ITERS=$API_ITERS "$b/bin/bench" ecdsa ec ecdh schnorrsig ellswift >> "$RAW"; \
    SECP256K1_BENCH_ITERS=$INT_ITERS "$b/bin/bench_internal" field group ecmult hash context >> "$RAW"; \
  done; \
  awk -F, '/^=== /{split($0,p," "); side=p[2]; next} /^[[:alnum:]_][[:alnum:]_]*[[:space:]]*,/{name=$1; val=$2+0; gsub(/^[[:space:]]+|[[:space:]]+$/,"",name); if(name!="Benchmark"){if(!(name in seen)){seen[name]=1; order[++n]=name} x[side,name]=val}} END{print "Benchmark\tBefore min(us)\tAfter min(us)\tDelta"; for(i=1;i<=n;i++){name=order[i]; b=x["before",name]; a=x["after",name]; if(b&&a) printf "%s\t%.6g\t%.6g\t%+.1f%%\n",name,b,a,100*(a-b)/b}}' "$RAW" | column -t -s $'\t'; \
  echo "raw: $RAW" >&2)

Linux size comparison script

BEFORE=8363a2d8d1b47857c437f7cf22bd11ab06c7c50f; AFTER=33b1b9c455eb2bb07eded939b36abc49859d2ccf; CC=gcc; JOBS=1; \
BH=$(git rev-parse --short=12 "$BEFORE"); AH=$(git rev-parse --short=12 "$AFTER"); RUN=$(date +%Y%m%d%H%M%S); ROOT="$PWD/.size-builds/gcc-$BH-$AH-$RUN"; \
(set -e; old=$(git symbolic-ref --short -q HEAD || git rev-parse HEAD); trap 'git switch -q "$old" 2>/dev/null || git switch -q --detach "$old"' EXIT; \
printf "host: %s, compiler: %s\n" "$(hostname)" "$("$CC" --version | sed -n '1p')"; \
for side in before after; do \
  ref=$([ "$side" = before ] && printf %s "$BEFORE" || printf %s "$AFTER"); git cat-file -e "$ref^{commit}" 2>/dev/null || git fetch -q origin "$ref"; h=$(git rev-parse --short=12 "$ref"); b="$ROOT/$side-$h"; \
  git switch -q --detach "$ref"; \
  cmake -S . -B "$b" -DCMAKE_C_COMPILER="$CC" -DCMAKE_BUILD_TYPE=Release -DBUILD_SHARED_LIBS=OFF -DSECP256K1_BUILD_BENCHMARK=OFF -DSECP256K1_BUILD_TESTS=OFF -DSECP256K1_BUILD_EXHAUSTIVE_TESTS=OFF -DSECP256K1_BUILD_CTIME_TESTS=OFF -DSECP256K1_BUILD_EXAMPLES=OFF -DSECP256K1_ENABLE_MODULE_MUSIG=OFF -DSECP256K1_VALGRIND=OFF >/dev/null; \
  cmake --build "$b" -j "$JOBS" --target secp256k1 >/dev/null; \
  lib=$(find "$b" -name 'libsecp256k1.a' -print -quit); \
  bytes=$(wc -c < "$lib" | tr -d ' '); \
  printf "%s\t%s\t%s\n" "$side" "$h" "$bytes"; \
  done | awk 'BEGIN{print "Side\tCommit\tlibsecp256k1.a bytes"} {print; size[$1]=$3} END{if(size["before"]&&size["after"]) printf "Delta\t\t%+d bytes (%+.2f%%)\n",size["after"]-size["before"],100*(size["after"]-size["before"])/size["before"]}' | column -t -s $'\t')

host: M4-Max.local, compiler: gcc-14 (Homebrew GCC 14.3.0) 14.3.0

Benchmark                 Before min(us)  After min(us)  Delta
ecdsa_verify              17.5            15.9           -9.1%
ecdsa_sign                12.3            12.1           -1.6%
ec_keygen                 8.07            7.77           -3.7%
ecdh                      16.6            15.1           -9.0%
schnorrsig_sign           8.6             8.29           -3.6%
schnorrsig_verify         17.8            16.1           -9.6%
ellswift_encode           11.1            11.1           +0.0%
ellswift_decode           4.68            4.69           +0.2%
ellswift_keygen           19.4            19.1           -1.5%
ellswift_ecdh             18.5            17.1           -7.6%
field_half                0.00154         0.00155        +0.6%
field_normalize           0.00665         0.00672        +1.1%
field_normalize_weak      0.00291         0.00291        +0.0%
field_sqr                 0.00871         0.0081         -7.0%
field_mul                 0.00969         0.0093         -4.0%
field_inverse             1.57            1.58           +0.6%
field_inverse_var         0.735           0.742          +1.0%
field_is_square_var       0.994           1              +0.6%
field_sqrt                2.21            2.22           +0.5%
group_double_var          0.0502          0.0447         -11.0%
group_add_var             0.126           0.11           -12.7%
group_add_affine          0.1             0.0922         -7.8%
group_add_affine_var      0.0887          0.077          -13.2%
group_add_zinv_var        0.106           0.0902         -14.9%
group_to_affine_var       0.774           0.774          +0.0%
ecmult_wnaf               0.334           0.334          +0.0%
hash_sha256               0.12            0.12           +0.0%
hash_hmac_sha256          0.464           0.463          -0.2%
hash_rfc6979_hmac_sha256  2.55            2.55           +0.0%
context_create            1.96            1.96           +0.0%

Side    Commit                 libsecp256k1.a bytes
before  8363a2d8d1b4           1254320
after   33b1b9c455eb           1311368
Delta   +57048 bytes (+4.55%)

host: WIN-A2EHOAU4JET (Intel(R) Xeon(R) CPU E5-2637 v2 @ 3.50GHz), system: Microsoft Windows NT 10.0.20348.0, compiler: Microsoft (R) C/C++ Optimizing Compiler Version 19.50.35728 for x64

Benchmark                    Before min(us) After min(us)    Delta
ecdsa_verify                           74.1          72.2    -2.6%
ecdsa_sign                             43.3          41.4    -4.4%
ec_keygen                              32.3            30    -7.1%
ecdh                                     75            68    -9.3%
schnorrsig_sign                        34.1            32    -6.2%
schnorrsig_verify                      74.9          73.1    -2.4%
ellswift_encode                        32.3          32.5    +0.6%
ellswift_decode                        14.4          14.6    +1.4%
ellswift_keygen                        64.6          62.9    -2.6%
ellswift_ecdh                          80.2          73.7    -8.1%
field_half                          0.00378       0.00378    +0.0%
field_normalize                      0.0114        0.0114    +0.0%
field_normalize_weak                0.00389       0.00389    +0.0%
field_sqr                            0.0272        0.0252    -7.4%
field_mul                            0.0394        0.0365    -7.4%
field_inverse                          3.27          3.29    +0.6%
field_inverse_var                      2.07          2.11    +1.9%
field_is_square_var                     2.7          2.67    -1.1%
field_sqrt                             7.47          6.98    -6.6%
group_double_var                      0.245         0.207   -15.5%
group_add_var                           0.6         0.525   -12.5%
group_add_affine                      0.465         0.405   -12.9%
group_add_affine_var                  0.418         0.358   -14.4%
group_add_zinv_var                    0.458         0.403   -12.0%
group_to_affine_var                    2.25          2.26    +0.4%
ecmult_wnaf                            0.58          0.59    +1.7%
hash_sha256                           0.332         0.333    +0.3%
hash_hmac_sha256                       1.31          1.31    +0.0%
hash_rfc6979_hmac_sha256               7.23           7.2    -0.4%
context_create                         3.32          3.34    +0.6%

Side     Commit          DLL bytes
before   8363a2d8d1b4      1239040
after    a37e34e187da      1414144
Delta                      175104 (+14.13%)

host: i9-ssd, compiler: gcc (GCC) 16.1.0

Benchmark                 Before min(us)  After min(us)  Delta
ecdsa_verify              39.6            37.5           -5.3%
ecdsa_sign                27.1            26.4           -2.6%
ec_keygen                 18.2            17.5           -3.8%
ecdh                      39              37.4           -4.1%
schnorrsig_sign           19.5            18.7           -4.1%
schnorrsig_verify         40.3            38.1           -5.5%
ellswift_encode           20.1            19.9           -1.0%
ellswift_decode           8.59            8.46           -1.5%
ellswift_keygen           38.2            37.3           -2.4%
ellswift_ecdh             43.4            40.9           -5.8%
field_half                0.00275         0.00275        +0.0%
field_normalize           0.00995         0.00994        -0.1%
field_normalize_weak      0.00378         0.00378        +0.0%
field_sqr                 0.0178          0.015          -15.7%
field_mul                 0.019           0.0168         -11.6%
field_inverse             2.41            2.39           -0.8%
field_inverse_var         1.32            1.28           -3.0%
field_is_square_var       1.69            1.68           -0.6%
field_sqrt                4.21            4.16           -1.2%
group_double_var          0.121           0.115          -5.0%
group_add_var             0.309           0.272          -12.0%
group_add_affine          0.248           0.231          -6.9%
group_add_affine_var      0.216           0.194          -10.2%
group_add_zinv_var        0.245           0.213          -13.1%
group_to_affine_var       1.41            1.36           -3.5%
ecmult_wnaf               0.536           0.581          +8.4%
hash_sha256               0.29            0.286          -1.4%
hash_hmac_sha256          1.14            1.13           -0.9%
hash_rfc6979_hmac_sha256  6.3             6.21           -1.4%
context_create            2.68            2.68           +0.0%

Side    Commit        libsecp256k1.a bytes
before  8363a2d8d1b4  1271040
after   33b1b9c455eb  1330808
Delta                 +59768 bytes (+4.70%)

host: i7-hdd, compiler: gcc (Ubuntu 14.2.0-19ubuntu2) 14.2.0

Benchmark                 Before min(us)  After min(us)  Delta
ecdsa_verify              43.1            38.4           -10.9%
ecdsa_sign                28.4            27.3           -3.9%
ec_keygen                 19.3            18             -6.7%
ecdh                      43.2            38.4           -11.1%
schnorrsig_sign           20.6            19.4           -5.8%
schnorrsig_verify         43.7            39.1           -10.5%
ellswift_encode           19.9            19.7           -1.0%
ellswift_decode           8.48            8.41           -0.8%
ellswift_keygen           39.2            37.8           -3.6%
ellswift_ecdh             46.4            41.8           -9.9%
field_half                0.00275         0.00275        +0.0%
field_normalize           0.00998         0.00998        +0.0%
field_normalize_weak      0.00402         0.00402        +0.0%
field_sqr                 0.017           0.0154         -9.4%
field_mul                 0.0218          0.0171         -21.6%
field_inverse             2.49            2.46           -1.2%
field_inverse_var         1.36            1.35           -0.7%
field_is_square_var       1.66            1.67           +0.6%
field_sqrt                4.07            4.07           +0.0%
group_double_var          0.132           0.119          -9.8%
group_add_var             0.346           0.28           -19.1%
group_add_affine          0.266           0.236          -11.3%
group_add_affine_var      0.243           0.201          -17.3%
group_add_zinv_var        0.265           0.216          -18.5%
group_to_affine_var       1.46            1.44           -1.4%
ecmult_wnaf               0.554           0.604          +9.0%
hash_sha256               0.305           0.298          -2.3%
hash_hmac_sha256          1.18            1.17           -0.8%
hash_rfc6979_hmac_sha256  6.47            6.43           -0.6%
context_create            2.73            2.71           -0.7%

host: rpi5-16-3, compiler: gcc (Ubuntu 14.2.0-19ubuntu2) 14.2.0

Benchmark                 Before min(us)  After min(us)  Delta
ecdsa_verify              157             156            -0.6%
ecdsa_sign                69.5            69.3           -0.3%
ec_keygen                 57.6            57.5           -0.2%
ecdh                      149             148            -0.7%
schnorrsig_sign           59.3            59             -0.5%
schnorrsig_verify         158             157            -0.6%
ellswift_encode           44.9            44.8           -0.2%
ellswift_decode           24.2            24.2           +0.0%
ellswift_keygen           103             102            -1.0%
ellswift_ecdh             154             154            +0.0%
field_half                0.00334         0.00334        +0.0%
field_normalize           0.0143          0.0144         +0.7%
field_normalize_weak      0.00543         0.00543        +0.0%
field_sqr                 0.0654          0.0618         -5.5%
field_mul                 0.0919          0.091          -1.0%
field_inverse             4.8             4.78           -0.4%
field_inverse_var         2.24            2.24           +0.0%
field_is_square_var       2.31            2.31           +0.0%
field_sqrt                17              17             +0.0%
group_double_var          0.526           0.525          -0.2%
group_add_var             1.35            1.34           -0.7%
group_add_affine          0.988           0.984          -0.4%
group_add_affine_var      0.926           0.915          -1.2%
group_add_zinv_var        1.02            1.01           -1.0%
group_to_affine_var       2.6             2.6            +0.0%
ecmult_wnaf               0.606           0.614          +1.3%
hash_sha256               0.316           0.315          -0.3%
hash_hmac_sha256          1.2             1.2            +0.0%
hash_rfc6979_hmac_sha256  6.62            6.62           +0.0%
context_create            4.18            4.18           +0.0%

host: rpi4-2-1, compiler: gcc (Ubuntu 14.2.0-19ubuntu2) 14.2.0

Benchmark                 Before min(us)  After min(us)  Delta
ecdsa_verify              222             216            -2.7%
ecdsa_sign                111             109            -1.8%
ec_keygen                 90.4            88.6           -2.0%
ecdh                      216             211            -2.3%
schnorrsig_sign           93.4            91.4           -2.1%
schnorrsig_verify         224             218            -2.7%
ellswift_encode           64.2            64.1           -0.2%
ellswift_decode           33.5            33.5           +0.0%
ellswift_keygen           156             153            -1.9%
ellswift_ecdh             226             220            -2.7%
field_half                0.00447         0.00447        +0.0%
field_normalize           0.0215          0.0215         +0.0%
field_normalize_weak      0.00783         0.00783        +0.0%
field_sqr                 0.0871          0.0822         -5.6%
field_mul                 0.126           0.121          -4.0%
field_inverse             8.54            8.54           +0.0%
field_inverse_var         3.25            3.25           +0.0%
field_is_square_var       3.57            3.57           +0.0%
field_sqrt                22.7            22.6           -0.4%
group_double_var          0.72            0.71           -1.4%
group_add_var             1.87            1.8            -3.7%
group_add_affine          1.4             1.36           -2.9%
group_add_affine_var      1.3             1.24           -4.6%
group_add_zinv_var        1.42            1.37           -3.5%
group_to_affine_var       3.76            3.75           -0.3%
ecmult_wnaf               1.06            1.05           -0.9%
hash_sha256               0.532           0.531          -0.2%
hash_hmac_sha256          2.02            2.02           +0.0%
hash_rfc6979_hmac_sha256  11.2            11.2           +0.0%
context_create            6.8             6.8            +0.0%

host: umbrel (Intel(R) N150), compiler: gcc (Debian 12.2.0-14+deb12u1) 12.2.0

Benchmark                 Before min(us)  After min(us)  Delta
ecdsa_verify              371             353            -4.9%
ecdsa_sign                163             160            -1.8%
ec_keygen                 129             123            -4.7%
ecdh                      347             332            -4.3%
schnorrsig_sign           131             126            -3.8%
schnorrsig_verify         373             356            -4.6%
ellswift_encode           143             142            -0.7%
ellswift_decode           71.3            70.8           -0.7%
ellswift_keygen           272             268            -1.5%
ellswift_ecdh             367             352            -4.1%
field_half                0.0124          0.0124         +0.0%
field_normalize           0.0439          0.0439         +0.0%
field_normalize_weak      0.0192          0.0192         +0.0%
field_sqr                 0.168           0.169          +0.6%
field_mul                 0.182           0.18           -1.1%
field_inverse             11.2            11.2           +0.0%
field_inverse_var         8.44            8.4            -0.5%
field_is_square_var       9.56            9.55           -0.1%
field_sqrt                45              44             -2.2%
group_double_var          1.25            1.18           -5.6%
group_add_var             2.92            2.68           -8.2%
group_add_affine          2.22            2.12           -4.5%
group_add_affine_var      2.02            1.86           -7.9%
group_add_zinv_var        2.21            2.01           -9.0%
group_to_affine_var       9.25            9.13           -1.3%
ecmult_wnaf               2.51            2.45           -2.4%
hash_sha256               1.13            1.12           -0.9%
hash_hmac_sha256          4.44            4.44           +0.0%
hash_rfc6979_hmac_sha256  24.4            24.4           +0.0%
context_create            14.2            14.1           -0.7%

host: nodl (Cortex-A53), compiler: gcc (Ubuntu 11.4.0-1ubuntu1~22.04.3) 11.4.0

Benchmark                 Before min(us)  After min(us)  Delta
ecdsa_verify              632             611            -3.3%
ecdsa_sign                308             291            -5.5%
ec_keygen                 228             212            -7.0%
ecdh                      633             585            -7.6%
schnorrsig_sign           231             221            -4.3%
schnorrsig_verify         630             594            -5.7%
ellswift_encode           156             156            +0.0%
ellswift_decode           80              76.1           -4.9%
ellswift_keygen           438             455            +3.9%
ellswift_ecdh             613             599            -2.3%
field_half                0.0106          0.00985        -7.1%
field_normalize           0.0483          0.0499         +3.3%
field_normalize_weak      0.0173          0.0173         +0.0%
field_sqr                 0.202           0.182          -9.9%
field_mul                 0.278           0.273          -1.8%
field_inverse             21.3            21.1           -0.9%
field_inverse_var         7.67            7.48           -2.5%
field_is_square_var       8.73            8.91           +2.1%
field_sqrt                65.8            61.9           -5.9%
group_double_var          2.04            1.9            -6.9%
group_add_var             5.35            5.09           -4.9%
group_add_affine          3.93            3.51           -10.7%
group_add_affine_var      3.56            3.32           -6.7%
group_add_zinv_var        3.94            3.65           -7.4%
group_to_affine_var       9.56            10.4           +8.8%
ecmult_wnaf               2.37            2.48           +4.6%
hash_sha256               1.13            1.19           +5.3%
hash_hmac_sha256          5.08            4.76           -6.3%
hash_rfc6979_hmac_sha256  33.3            31.2           -6.3%
context_create            19.3            18.8           -2.6%

Reviewer measurements

andrewtoth, i9-14900HX, GCC 12.3

Benchmark                 Before min(us)  After min(us)  Delta
ecdsa_verify              22.7            21.5           -5.3%
ecdsa_sign                14.3            14.0           -2.1%
ec_keygen                 9.90            9.54           -3.6%
ecdh                      21.6            20.7           -4.2%
schnorrsig_sign           10.6            10.2           -3.8%
schnorrsig_verify         23.1            21.8           -5.6%
ellswift_ecdh             23.8            22.7           -4.6%
field_sqr                 0.00912         0.00898        -1.5%
field_mul                 0.0114          0.0107         -6.1%
field_inverse             1.23            1.24           +0.8%
field_inverse_var         0.770           0.773          +0.4%
field_is_square_var       1.06            1.05           -0.9%
field_sqrt                2.82            2.46           -12.8%
group_double_var          0.0701          0.0612         -12.7%
group_add_var             0.168           0.153          -8.9%
group_add_affine          0.132           0.123          -6.8%
group_add_affine_var      0.120           0.103          -14.2%
group_add_zinv_var        0.138           0.117          -15.2%
group_to_affine_var       0.820           0.819          -0.1%

theStack, Snapdragon X Elite X1E-78-100, GCC 14.2.0

Benchmark                 Before min(us)  After min(us)  Delta
ecdsa_verify              24.1            21.4           -11.2%
ecdsa_sign                19.0            18.5           -2.6%
schnorrsig_sign           13.0            12.7           -2.3%
schnorrsig_verify         24.4            21.7           -11.1%

Bitcoin Core subtree bench_bitcoin -filter=VerifyScript.*:

Benchmark                   Before ns/script  After ns/script  Delta
VerifyScriptP2TR_KeyPath    23679.52          20899.66         -11.7%
VerifyScriptP2TR_ScriptPath 43430.71          39280.19         -9.6%
VerifyScriptP2WPKH          23526.82          20870.22         -11.3%

sipa, Ryzen 5950X, GCC 15.2.0

Benchmark                 Before min(us)  After min(us)  Delta
ecdsa_verify              30.8            27.3           -11.4%
ecdsa_sign                18.7            17.2           -8.0%
ec_keygen                 13.6            12.2           -10.3%
ecdh                      29.8            26.7           -10.4%
ecdsa_recover             31.0            28.2           -9.0%
schnorrsig_sign           14.4            13.0           -9.7%
schnorrsig_verify         31.1            28.5           -8.4%
ellswift_encode           13.2            13.4           +1.5%
ellswift_decode           5.79            5.84           +0.9%
ellswift_keygen           26.8            25.7           -4.1%
ellswift_ecdh             32.1            29.6           -7.8%

clang:

host: i9-ssd, compiler: Ubuntu clang version 22.1.6 (++20260508084839+c0262e742787-1~exp1~20260508204859.77)

Benchmark                 Before min(us)  After min(us)  Delta
ecdsa_verify              40.1            39.8           -0.7%
ecdsa_sign                29.2            29.1           -0.3%
ec_keygen                 19.5            19.6           +0.5%
ecdh                      40.3            39.8           -1.2%
schnorrsig_sign           21              20.9           -0.5%
schnorrsig_verify         40.6            40.3           -0.7%
ellswift_encode           20.1            20.1           +0.0%
ellswift_decode           8.43            8.41           -0.2%
ellswift_keygen           39.8            39.7           -0.3%
ellswift_ecdh             44.1            43.5           -1.4%
field_half                0.0028          0.0028         +0.0%
field_normalize           0.00889         0.00891        +0.2%
field_normalize_weak      0.0037          0.0037         +0.0%
field_sqr                 0.0144          0.0144         +0.0%
field_mul                 0.021           0.019          -9.5%
field_inverse             2.6             2.64           +1.5%
field_inverse_var         1.34            1.35           +0.7%
field_is_square_var       1.73            1.73           +0.0%
field_sqrt                3.95            3.96           +0.3%
group_double_var          0.128           0.125          -2.3%
group_add_var             0.311           0.31           -0.3%
group_add_affine          0.243           0.242          -0.4%
group_add_affine_var      0.207           0.207          +0.0%
group_add_zinv_var        0.229           0.228          -0.4%
group_to_affine_var       1.43            1.44           +0.7%
ecmult_wnaf               0.536           0.598          +11.6%
hash_sha256               0.3             0.299          -0.3%
hash_hmac_sha256          1.18            1.18           +0.0%
hash_rfc6979_hmac_sha256  6.51            6.53           +0.3%
context_create            2.16            2.15           -0.5%

reindex-chainstate:

for DBCACHE in 5000; do \
    COMMITS="67250b1d97e6159d908ef44639b6a12471e7c717 c264526415f38afb9890003003b7de39b370b745"; \
    STOP=950059; CC=gcc; CXX=g++; \
    BASE_DIR="/mnt/my_storage"; DATA_DIR="$BASE_DIR/BitcoinData"; LOG_DIR="$BASE_DIR/logs"; \
    (echo ""; for c in $COMMITS; do git fetch -q origin "$c" 2>/dev/null || true; git log -1 --pretty='%h %s' $c || exit 1; done) && \
    (echo "" && echo "$(date -I) | reindex-chainstate | ${STOP} blocks | dbcache ${DBCACHE} | $(hostname) | $(uname -m) | $(lscpu | grep 'Model name' | head -1 | cut -d: -f2 | xargs) | $(nproc) cores | $(free -h | awk '/^Mem:/{print $2}') RAM | $(lsblk -no ROTA $(df --output=source $BASE_DIR | tail -1) | grep -q 1 && echo HDD || echo SSD)"; echo "") && \
    hyperfine \
    --sort command \
    --runs 1 \
    --export-json "$BASE_DIR/rdx-$(sed -E 's/[^ ]+/\L&/g;s/[.]/_/g;s/ /-/g'<<<"$COMMITS")-$STOP-$DBCACHE-$CC.json" \
    --parameter-list COMMIT ${COMMITS// /,} \
    --prepare "killall -9 bitcoind 2>/dev/null; rm -f ./build/bin/bitcoind; git clean -fxd; git reset --hard {COMMIT} && \
      cmake -B build -G Ninja -DCMAKE_BUILD_TYPE=Release && ninja -C build bitcoind -j1 && \
      ./build/bin/bitcoind -datadir=$DATA_DIR -stopatheight=$STOP -dbcache=1000 -printtoconsole=0; sleep 20; rm -f $DATA_DIR/debug.log; rm -rfd $DATA_DIR/indexes;" \
    --conclude "killall bitcoind || true; sleep 5; grep -q 'height=0' $DATA_DIR/debug.log && grep -q 'Disabling script verification at block #1' $DATA_DIR/debug.log && grep -q 'height=$STOP' $DATA_DIR/debug.log && grep 'Bitcoin Core version' $DATA_DIR/debug.log | grep -q \"\$(git rev-parse --short=12 {COMMIT})\"; \
                cp $DATA_DIR/debug.log $LOG_DIR/debug-{COMMIT}-$(date +%s).log" \
    "COMPILER=$CC ./build/bin/bitcoind -datadir=$DATA_DIR -stopatheight=$STOP -dbcache=$DBCACHE -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 -assumevalid=0"; \
done

67250b1d97 parallel input fetcher
c264526415 Refactor: optimize scalar reduction and arithmetic functions.

2026-05-28 | reindex-chainstate | 950059 blocks | dbcache 5000 | i9-ssd | x86_64 | Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz | 16 cores | 62Gi RAM | SSD

Benchmark 1: COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=950059 -dbcache=5000 -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 -assumevalid=0 (COMMIT = 67250b1d97e6159d908ef44639b6a12471e7c717)
  Time (abs ≡):        37155.108 s               [User: 375835.978 s, System: 978.929 s]

Benchmark 2: COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=950059 -dbcache=5000 -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 -assumevalid=0 (COMMIT = c264526415f38afb9890003003b7de39b370b745)
  Time (abs ≡):        36261.785 s               [User: 362247.387 s, System: 1002.867 s]

Relative speed comparison
        1.02          COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=950059 -dbcache=5000 -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 -assumevalid=0 (COMMIT = 67250b1d97e6159d908ef44639b6a12471e7c717)
        1.00          COMPILER=gcc ./build/bin/bitcoind -datadir=/mnt/my_storage/BitcoinData -stopatheight=950059 -dbcache=5000 -reindex-chainstate -blocksonly -connect=0 -printtoconsole=0 -assumevalid=0 (COMMIT = c264526415f38afb9890003003b7de39b370b745)

andrewtoth · 2026-06-01T15:41:42Z

Ran the benchmarks on i9-14900HX built with GCC 12.3, confirmed the speedups:

Results (Min us, lower is better)

Benchmark	Before	After	Delta
ecdsa_verify	22.7	21.5	-5.3%
ecdsa_sign	14.3	14.0	-2.1%
ec_keygen	9.90	9.54	-3.6%
ecdh	21.6	20.7	-4.2%
schnorrsig_sign	10.6	10.2	-3.8%
schnorrsig_verify	23.1	21.8	-5.6%
ellswift_ecdh	23.8	22.7	-4.6%
field_sqr	0.00912	0.00898	-1.5%
field_mul	0.0114	0.0107	-6.1%
field_inverse	1.23	1.24	+0.8%
field_inverse_var	0.770	0.773	+0.4%
field_is_square_var	1.06	1.05	-0.9%
field_sqrt	2.82	2.46	-12.8%
group_double_var	0.0701	0.0612	-12.7%
group_add_var	0.168	0.153	-8.9%
group_add_affine	0.132	0.123	-6.8%
group_add_affine_var	0.120	0.103	-14.2%
group_add_zinv_var	0.138	0.117	-15.2%
group_to_affine_var	0.820	0.819	-0.1%

theStack · 2026-06-02T01:14:52Z

Seeing a ~12.5% speedup for both ECDSA and Schnorr verification and ~3% for signing on my arm64 machine (Snapdragon X Elite - X1E-78-100), using GCC 14.2.0:

master:

$ ./build/bin/bench verify sign
Benchmark                     ,    Min(us)    ,    Avg(us)    ,    Max(us)    

ecdsa_verify                  ,    24.1       ,    24.1       ,    24.5    
ecdsa_sign                    ,    19.0       ,    19.0       ,    19.1    
schnorrsig_sign               ,    13.0       ,    13.1       ,    13.2    
schnorrsig_verify             ,    24.4       ,    24.4       ,    24.5

PR:

$ ./build/bin/bench verify sign
Benchmark                     ,    Min(us)    ,    Avg(us)    ,    Max(us)    

ecdsa_verify                  ,    21.4       ,    21.4       ,    21.8    
ecdsa_sign                    ,    18.5       ,    18.5       ,    18.5    
schnorrsig_sign               ,    12.7       ,    12.7       ,    12.7    
schnorrsig_verify             ,    21.7       ,    21.7       ,    21.7
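
The "~12.5% speedup" above and the negative Min(us) deltas quoted elsewhere in this thread describe the same measurements under two different conventions: relative change in run time versus relative gain in throughput. A minimal sketch of both, assuming hypothetical helper names that are not part of libsecp256k1 or its bench tool:

```c
#include <assert.h>

/* Hypothetical helpers for illustration only. */

/* Relative change in run time, in percent. Negative means faster. */
static double time_delta_percent(double before_us, double after_us) {
    assert(before_us > 0.0);
    return (after_us - before_us) / before_us * 100.0;
}

/* Relative gain in operations per unit time, in percent. Positive means faster. */
static double throughput_speedup_percent(double before_us, double after_us) {
    assert(after_us > 0.0);
    return (before_us / after_us - 1.0) * 100.0;
}
```

For the ecdsa_verify row (24.1 us on master, 21.4 us with the PR), `time_delta_percent(24.1, 21.4)` gives about -11.2 and `throughput_speedup_percent(24.1, 21.4)` gives about 12.6, so both figures are consistent with the raw numbers.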

Applying this change to the Bitcoin Core secp256k1 subtree (Branch apply-secp-pr1859) shows the speedup in the script verification benchmarks as well (run via ./build/bin/bench_bitcoin -filter=VerifyScript.*):

master (commit theStack/bitcoin@654a522):

ns/script	script/s	err%	total	benchmark
23,679.52	42,230.58	0.3%	0.01	`VerifyScriptP2TR_KeyPath`
43,430.71	23,025.18	0.4%	0.01	`VerifyScriptP2TR_ScriptPath`
23,526.82	42,504.68	0.3%	0.01	`VerifyScriptP2WPKH`

PR applied (commit theStack/bitcoin@494a473):

ns/script	script/s	err%	total	benchmark
20,899.66	47,847.67	0.3%	0.01	`VerifyScriptP2TR_KeyPath`
39,280.19	25,458.13	0.8%	0.01	`VerifyScriptP2TR_ScriptPath`
20,870.22	47,915.16	0.6%	0.01	`VerifyScriptP2WPKH`

real-or-random · 2026-06-02T07:09:05Z

Concept ACK

That's a very interesting observation. So far, we tried to stay away from guiding the compiler too much, but the ratio of added complexity vs. gains here is pretty good.

@l0rinc What I always wanted to try is profile-guided optimization, e.g., where the profile is generated in a benchmark run that only performs signature verification (this could even be done automatically as part of the build process). I imagine there could be more low-hanging fruit. Would you be interested in looking into this as well?

The 5x52 field multiplication and squaring routines are hot in group arithmetic and scalar multiplication. Use the new `SECP256K1_FORCE_INLINE` for the thin wrappers and `int128` inner helpers so compilers can schedule the 64x64->128 arithmetic without a call boundary.

The helper uses forced inlining in optimized release-style builds, but falls back to `SECP256K1_INLINE` when no-inline, size optimization, or debug-style macros ask not to force it.

Across the measured GCC and MSVC Release builds, this improves ECDSA verification by 0.6% to 9.1%, ECDH by 0.7% to 9.3%, and Schnorr verification by 0.6% to 9.6%. The direct field benchmarks generally show the intended effect on field squaring and multiplication, while Clang results are mostly flat and less consistently positive.

This is a code-size tradeoff: the tested static library builds grew by about 4.6% to 4.7%, and the tested Windows Release DLL grew by 14.1%.

Co-authored-by: Sebastian Falbesoner <sebastian.falbesoner@gmail.com>
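
The pattern the commit message describes, a thin wrapper calling a small 128-bit helper with both forced inline, can be sketched roughly as follows. This is not the PR's code: `DEMO_FORCE_INLINE`, `demo_u128_accum_mul`, and `demo_dot2` are hypothetical names, the macro is a simplified GCC/Clang-only stand-in for `SECP256K1_FORCE_INLINE`, and the sketch assumes a target where `unsigned __int128` is available.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified, hypothetical stand-in for the macro under review.
 * Force inlining only in optimized builds that do not ask for
 * no-inline or size optimization; otherwise use plain inline. */
#if defined(__GNUC__) && defined(__OPTIMIZE__) && \
    !defined(__NO_INLINE__) && !defined(__OPTIMIZE_SIZE__)
#  define DEMO_FORCE_INLINE static inline __attribute__((always_inline))
#else
#  define DEMO_FORCE_INLINE static inline
#endif

/* Assumes a GCC-compatible compiler on a 64-bit target. */
typedef unsigned __int128 demo_u128;

/* Inner helper: acc += a * b, one 64x64->128 multiplication. */
DEMO_FORCE_INLINE void demo_u128_accum_mul(demo_u128 *acc, uint64_t a, uint64_t b) {
    *acc += (demo_u128)a * b;
}

/* Thin wrapper: two-term dot product, split into high and low 64-bit halves.
 * With both functions inlined, the compiler can interleave the two
 * multiplications instead of issuing two calls. */
DEMO_FORCE_INLINE void demo_dot2(uint64_t *hi, uint64_t *lo,
                                 const uint64_t a[2], const uint64_t b[2]) {
    demo_u128 acc = 0;
    /* Sketch-only bound: limbs below 2^56 keep the sum well inside 128 bits.
     * The check compiles away when NDEBUG is defined. */
    assert(((a[0] | a[1] | b[0] | b[1]) >> 56) == 0);
    demo_u128_accum_mul(&acc, a[0], b[0]);
    demo_u128_accum_mul(&acc, a[1], b[1]);
    *lo = (uint64_t)acc;
    *hi = (uint64_t)(acc >> 64);
}
```

Whether forcing the inline actually helps depends on the compiler and target, which is why the thread relies on per-machine benchmarks rather than on the attribute alone.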

sipa · 2026-06-02T13:40:16Z

Concept ACK.

Master:

Benchmark                     ,    Min(us)    ,    Avg(us)    ,    Max(us)    

ecdsa_verify                  ,    30.8       ,    31.1       ,    33.4    
ecdsa_sign                    ,    18.7       ,    18.7       ,    18.7    
ec_keygen                     ,    13.6       ,    13.6       ,    13.6    
ecdh                          ,    29.8       ,    29.9       ,    29.9    
ecdsa_recover                 ,    31.0       ,    32.3       ,    34.4    
schnorrsig_sign               ,    14.4       ,    14.4       ,    14.4    
schnorrsig_verify             ,    31.1       ,    31.1       ,    31.2    
ellswift_encode               ,    13.2       ,    13.2       ,    13.3    
ellswift_decode               ,     5.79      ,     5.80      ,     5.82   
ellswift_keygen               ,    26.8       ,    26.8       ,    26.8    
ellswift_ecdh                 ,    32.1       ,    32.1       ,    32.2

This PR:

Benchmark                     ,    Min(us)    ,    Avg(us)    ,    Max(us)    

ecdsa_verify                  ,    27.3       ,    28.2       ,    30.3    
ecdsa_sign                    ,    17.2       ,    17.9       ,    19.4    
ec_keygen                     ,    12.2       ,    12.3       ,    12.3    
ecdh                          ,    26.7       ,    26.8       ,    26.8    
ecdsa_recover                 ,    28.2       ,    28.2       ,    28.3    
schnorrsig_sign               ,    13.0       ,    13.4       ,    14.9    
schnorrsig_verify             ,    28.5       ,    28.7       ,    28.8    
ellswift_encode               ,    13.4       ,    13.5       ,    13.5    
ellswift_decode               ,     5.84      ,     5.87      ,     5.91   
ellswift_keygen               ,    25.7       ,    25.9       ,    26.2    
ellswift_ecdh                 ,    29.6       ,    29.6       ,    29.9

(GCC 15.2.0 on Ryzen 5950X)

real-or-random

utACK 1c537ab

hebasto

Concept ACK.

hebasto · 2026-06-13T18:00:24Z

 #  define SECP256K1_INLINE inline
 # endif

+# if !defined(_DEBUG) && !defined(__NO_INLINE__) && !defined(__OPTIMIZE_SIZE__)


Because Microsoft's cl.exe defines neither the __OPTIMIZE_SIZE__ nor the __OPTIMIZE__ macro, building with cmake --build build --config MinSizeRel will still result in __forceinline being used.

l0rinc · 2026-06-13

Yes, see #1859 (comment)

Do you think we should change anything here?

hebasto · 2026-06-13

We could manually define __OPTIMIZE_SIZE__ for the "MinSizeRel" configuration on Windows in the build system. It's not a huge deal, though, since we recommend using clang-cl.exe for Windows builds anyway.

hebasto

The Bitcoin Core project has a similar macro named ALWAYS_INLINE. Could we adopt SECP256K1_ALWAYS_INLINE here for consistency across the two closely related projects?

hebasto · 2026-06-13T18:14:27Z

 # endif

+# if !defined(_DEBUG) && !defined(__NO_INLINE__) && !defined(__OPTIMIZE_SIZE__)
+#  if defined(_MSC_VER)


Slightly unrelated (?) to this specific change:

On Windows "Release" builds, both cl.exe and clang-cl.exe hit this branch. However, for the SECP256K1_INLINE macro above, cl.exe uses the Microsoft-specific __inline while clang-cl.exe uses the standard inline.

We should probably make clang-cl.exe handle SECP256K1_INLINE and SECP256K1_FORCE_INLINE consistently, choosing either the MSVC extensions or the standard keywords for both.

l0rinc closed this May 30, 2026

l0rinc reopened this May 30, 2026

real-or-random added the performance label Jun 1, 2026

real-or-random reviewed Jun 2, 2026

Comment thread src/util.h Outdated

real-or-random added the tweak/refactor label Jun 2, 2026

l0rinc force-pushed the l0rinc/force-inline-5x52-mul-sqr branch from ac915c9 to 1c537ab Compare June 3, 2026 21:08

real-or-random approved these changes Jun 9, 2026

hebasto reviewed Jun 13, 2026
